Primary analyses
Comparing BERT and ELMo distances
df_distances %>%
  ggplot(aes(x = distance_elmo,
             y = distance_bert,
             color = same)) +
  geom_point(alpha = .5) +
  theme_minimal() +
  geom_smooth(method = "lm") +
  facet_grid(~ambiguity_type)
## `geom_smooth()` using formula 'y ~ x'
cor.test(df_distances$distance_bert,
         df_distances$distance_elmo,
         method = 'spearman')
##
## Spearman's rank correlation rho
##
## data: df_distances$distance_bert and df_distances$distance_elmo
## S = 23277112, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5397715
Is cosine distance larger for usages across senses?
First, we ask whether the existence of a sense boundary explains significant variance in the cosine distance between two usages of a word.
In this analysis, we add a random effect for the model (BERT or ELMo) being used to assess cosine distance.
df_distances_reshaped = df_distances %>%
  mutate(elmo = distance_elmo,
         bert = distance_bert) %>%
  pivot_longer(c(elmo, bert), names_to = "model",
               values_to = "distance")
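The reshape above can be illustrated on a toy data frame (made-up values): each wide row, with one distance column per model, becomes two long rows, one per (pair, model) combination:

```r
library(tidyr)

# Toy wide-format data (made-up values): one row per word pair,
# with a separate distance column for each model.
toy <- data.frame(pair = c("p1", "p2"),
                  elmo = c(0.30, 0.55),
                  bert = c(0.25, 0.60))

# pivot_longer stacks the two distance columns, so each pair now
# contributes one row for "elmo" and one row for "bert".
toy_long <- pivot_longer(toy, c(elmo, bert),
                         names_to = "model", values_to = "distance")
nrow(toy_long)  # 4
```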
model_same = lmer(data = df_distances_reshaped,
                  distance ~ same +
                    Class +
                    (1 | model) +
                    (1 + same | word),
                  control = lmerControl(optimizer = "bobyqa"),
                  REML = FALSE)
model_reduced = lmer(data = df_distances_reshaped,
                     distance ~
                       Class +
                       (1 | model) +
                       (1 + same | word),
                     control = lmerControl(optimizer = "bobyqa"),
                     REML = FALSE)
summary(model_same)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: distance ~ same + Class + (1 | model) + (1 + same | word)
## Data: df_distances_reshaped
## Control: lmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## -2635.6 -2593.9 1325.8 -2651.6 1336
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6646 -0.5966 -0.0210 0.4393 5.1259
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## word (Intercept) 0.002891 0.05377
## sameTRUE 0.001107 0.03328 -0.99
## model (Intercept) 0.007822 0.08844
## Residual 0.007112 0.08433
## Number of obs: 1344, groups: word, 112; model, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.213266 0.062853 3.393
## sameTRUE -0.099220 0.005805 -17.091
## ClassV -0.004643 0.009522 -0.488
##
## Correlation of Fixed Effects:
## (Intr) smTRUE
## sameTRUE -0.065
## ClassV -0.038 0.000
anova(model_same, model_reduced)
## Data: df_distances_reshaped
## Models:
## model_reduced: distance ~ Class + (1 | model) + (1 + same | word)
## model_same: distance ~ same + Class + (1 | model) + (1 + same | word)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_reduced 7 -2493.8 -2457.4 1253.9 -2507.8
## model_same 8 -2635.6 -2593.9 1325.8 -2651.6 143.72 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
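The chi-square statistic reported by anova() is simply the drop in deviance (twice the log-likelihood difference) between the nested models, which we can recompute from the table above:

```r
# Recompute the likelihood-ratio test from the deviances in the
# anova() table above: Chisq = deviance(reduced) - deviance(full),
# on 1 df, since the models differ by the single fixed effect 'same'.
dev_reduced <- -2507.8
dev_full    <- -2651.6
chisq <- dev_reduced - dev_full  # 143.8, matching the reported 143.72
                                 # up to rounding of the deviances

pchisq(chisq, df = 1, lower.tail = FALSE)  # far below 2.2e-16
```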
We find that it does. We can illustrate this visually as well:
df_distances_reshaped %>%
  ggplot(aes(x = distance,
             fill = same,
             y = model)) +
  geom_density_ridges2(alpha = .6) +
  theme_minimal() +
  labs(x = "Cosine Distance",
       y = "Model") +
  facet_wrap(~ambiguity_type) +
  theme(axis.title = element_text(size = rel(2)),
        axis.text = element_text(size = rel(2)),
        legend.text = element_text(size = rel(2)),
        legend.title = element_text(size = rel(2)),
        strip.text.x = element_text(size = rel(2)))
## Picking joint bandwidth of 0.0213
## Picking joint bandwidth of 0.0208
ggsave("../../Figures/cosine_distances.pdf", dpi = 300)
## Saving 7 x 5 in image
## Picking joint bandwidth of 0.0213
## Picking joint bandwidth of 0.0208
Does cosine distance vary as a function of the type of ambiguity?
Above, we saw that the cosine distance between two usages varies as a function of whether those usages belong to the same sense.
Here, we show that a model with both condition (ambiguity type) and same does not explain more variance than a model with only same; we would not really expect it to, given that same-sense pairs are included here.
model_both = lmer(data = df_distances_reshaped,
                  distance ~ same +
                    Class +
                    ambiguity_type +
                    (1 | model) +
                    (1 + same | word),
                  control = lmerControl(optimizer = "bobyqa"),
                  REML = FALSE)
## boundary (singular) fit: see ?isSingular
anova(model_both, model_same)
## Data: df_distances_reshaped
## Models:
## model_same: distance ~ same + Class + (1 | model) + (1 + same | word)
## model_both: distance ~ same + Class + ambiguity_type + (1 | model) + (1 +
## model_both: same | word)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_same 8 -2635.6 -2593.9 1325.8 -2651.6
## model_both 9 -2634.7 -2587.8 1326.3 -2652.7 1.0846 1 0.2977
But there is also no significant interaction between condition and same, though the effect trends in that direction (p = .14).
model_interaction = lmer(data = df_distances_reshaped,
                         distance ~ ambiguity_type * same +
                           Class +
                           (1 | model) +
                           (1 + same | word),
                         control = lmerControl(optimizer = "bobyqa"),
                         REML = FALSE)
## boundary (singular) fit: see ?isSingular
anova(model_both, model_interaction)
## Data: df_distances_reshaped
## Models:
## model_both: distance ~ same + Class + ambiguity_type + (1 | model) + (1 +
## model_both: same | word)
## model_interaction: distance ~ ambiguity_type * same + Class + (1 | model) + (1 +
## model_interaction: same | word)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_both 9 -2634.7 -2587.8 1326.3 -2652.7
## model_interaction 10 -2634.8 -2582.8 1327.4 -2654.8 2.1907 1 0.1388
summary(model_interaction)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: distance ~ ambiguity_type * same + Class + (1 | model) + (1 +
## same | word)
## Data: df_distances_reshaped
## Control: lmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## -2634.8 -2582.8 1327.4 -2654.8 1334
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.7451 -0.6002 -0.0174 0.4487 5.0784
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## word (Intercept) 0.002893 0.05379
## sameTRUE 0.001078 0.03284 -1.00
## model (Intercept) 0.007822 0.08844
## Residual 0.007100 0.08426
## Number of obs: 1344, groups: word, 112; model, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.216031 0.063353 3.410
## ambiguity_typePolysemy -0.003760 0.012308 -0.305
## sameTRUE -0.111194 0.009922 -11.207
## ClassV -0.005768 0.009564 -0.603
## ambiguity_typePolysemy:sameTRUE 0.018124 0.012207 1.485
##
## Correlation of Fixed Effects:
## (Intr) ambg_P smTRUE ClassV
## ambgty_typP -0.125
## sameTRUE -0.111 0.572
## ClassV -0.028 -0.077 0.000
## ambg_P:TRUE 0.090 -0.704 -0.813 0.000
## convergence code: 0
## boundary (singular) fit: see ?isSingular
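The fixed effects above can be combined by hand to get the model's predicted condition differences (noun baseline, random effects at zero), which makes the direction of the interaction concrete: the same-sense reduction in distance is somewhat smaller for polysemy than for homonymy.

```r
# Fixed-effect estimates copied from the summary above
# (noun baseline, random effects set to zero).
b_intercept <- 0.216031   # homonymy, different-sense baseline
b_polysemy  <- -0.003760  # ambiguity_typePolysemy
b_same      <- -0.111194  # sameTRUE
b_poly_same <- 0.018124   # ambiguity_typePolysemy:sameTRUE

# Same-sense reduction in predicted cosine distance, per condition:
hom_drop  <- -b_same                  # 0.111 for homonymy
poly_drop <- -(b_same + b_poly_same)  # 0.093 for polysemy
c(homonymy = hom_drop, polysemy = poly_drop)
```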
We can also compare the model's predictions against the observed values for cosine distance.
df_distances_reshaped$predictions = predict(model_interaction)
df_distances_reshaped %>%
  ggplot(aes(x = predictions,
             y = distance,
             color = same,
             shape = ambiguity_type)) +
  geom_point(alpha = .4) +
  facet_grid(~model) +
  theme_minimal()
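Beyond the scatterplot, this fit could be summarized numerically by correlating the predictions with the observed distances. A sketch with made-up toy vectors standing in for the real `predictions` and `distance` columns (which we do not recompute here):

```r
# Toy stand-ins (made-up values) for the model's fitted values and the
# observed cosine distances; with the real data one would use
# df_distances_reshaped$predictions and df_distances_reshaped$distance.
predictions <- c(0.10, 0.18, 0.22, 0.30, 0.12, 0.26)
distance    <- c(0.08, 0.20, 0.25, 0.28, 0.15, 0.24)

r <- cor(predictions, distance)
r^2  # squared correlation between fitted and observed values
```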